Sparsity-Based Estimation of a Panel Quantile Count Data Model with Applications to Big Data
Abstract
In this paper we introduce a panel quantile estimator for count data with individual heterogeneity, by constructing continuous variables whose conditional quantiles have a one-to-one relationship with those of the conditional count response variable. The new method is needed as a result of the increased availability of Big Data, which allows us to track event counts at the individual level for a large number of activities, from web clicks and retweets to store visits and purchases. At the same time, the presence of many different subpopulations in a large dataset requires us to pay close attention to individual heterogeneity. We propose a penalized quantile regression estimator with fixed effects and investigate the conditions under which the slope parameter estimator is asymptotically Gaussian. We investigate solutions to the computational challenges resulting from the need to estimate tens of thousands of parameters in a Big Data setting. Using a series of Monte Carlo simulations, we caution against penalizing in models with substantial zero inflation and endogenous covariates. We present an empirical application to individual trip counts to the store based on a large panel of food purchase transactions. JEL: C21, C23, C25, C55.
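The two ingredients named in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: the continuous variable is built by adding Uniform[0, 1) noise to the count (the standard jittering device for count quantile regression), the conditional quantile is linear in the covariates, and the penalty is an L1 penalty on the fixed effects. All function names and the tuning parameter `lam` are hypothetical and are not taken from the paper.

```python
import numpy as np


def jitter_counts(y, rng):
    """Assumed construction: Z = Y + U with U ~ Uniform[0, 1), so Z is continuous."""
    return y + rng.uniform(0.0, 1.0, size=np.shape(y))


def count_quantile_from_jittered(qz):
    """Map a quantile of Z back to the count scale: Q_Y(tau) = ceil(Q_Z(tau) - 1)."""
    return np.ceil(np.asarray(qz, dtype=float) - 1.0).astype(int)


def check_loss(u, tau):
    """Quantile (check) loss: u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0).astype(float))


def penalized_objective(beta, alpha, z, x, ids, tau, lam):
    """Check loss of the jittered response plus an assumed L1 penalty on fixed effects.

    beta  : slope vector, shape (p,)
    alpha : individual fixed effects, shape (n_individuals,)
    ids   : integer index of the individual for each observation
    lam   : hypothetical tuning parameter shrinking the fixed effects toward zero
    """
    resid = z - x @ beta - alpha[ids]
    return check_loss(resid, tau).sum() + lam * np.abs(alpha).sum()
```

Minimizing `penalized_objective` over `beta` and `alpha` is a linear program once the check loss and absolute values are rewritten with slack variables; the sketch only evaluates the objective and leaves the choice of solver open.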
Similar resources
Bayesian Quantile Regression with Adaptive Lasso Penalty for Dynamic Panel Data
Dynamic panel data models are an important part of medical, social, and economic studies. The presence of the lagged dependent variable as an explanatory variable is a defining trait of these models. The estimation problem in these models arises from the correlation between the lagged dependent variable and the current disturbance. Recently, quantile regression to analyze dynamic pa...
A Novel Fuzzy-Based Similarity Measure for Collaborative Filtering to Alleviate the Sparsity Problem
Memory-based collaborative filtering is the most popular approach to build recommender systems. Despite its success in many applications, it still suffers from several major limitations, including data sparsity. Sparse data affect the quality of the user similarity measurement and consequently the quality of the recommender system. In this paper, we propose a novel user similarity measure based...
The examination of relationship between socioeconomic factors and number of tuberculosis using quantile regression model for count data in Iran 2010-2011
Background: Poverty and low socioeconomic status are the most important reasons for the increasing global burden of tuberculosis, not only in developing countries but also, for particular groups, in developed countries. The purpose of this study was to assess the association between socioeconomic factors and the number of tuberculosis patients using quantile regression for count data. Me...
Estimation of Count Data using Bivariate Negative Binomial Regression Models
The negative binomial regression model (NBR) is a popular approach for modeling overdispersed count data with covariates. Several parameterizations have been proposed for NBR, and the two best-known models, the negative binomial-1 regression model (NBR-1) and the negative binomial-2 regression model (NBR-2), have been applied. Another parameterization of NBR is negative binomial-P regression mode...
Bayesian Quantile Regression with Adaptive Elastic Net Penalty for Longitudinal Data
Longitudinal studies are an important part of epidemiological surveys, clinical trials, and social studies. In longitudinal studies, responses are measured repeatedly over time. Often, the main goal is to characterize the change in responses over time and the factors that influence the change. Recently, to analyze this kind of data, quantile regression has been taken ...